Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Even better(er) binary quantization #117994

Conversation

benwtrent
Copy link
Member

This measurably improves BBQ by adjusting the underlying algorithm to an optimized per vector scalar quantization.

This is a brand new way to quantize vectors. Instead of there being a global set of upper and lower quantile bands, these are optimized and calculated per individual vector. Additionally, vectors are centered on a common centroid.

This allows for an almost 32x reduction in memory, and even better recall than before at the cost of slightly increasing indexing time.

Additionally, this new approach is easily generalizable to various other bit sizes (e.g. 2 bits, etc.). While not taken advantage of yet, we may update our scalar quantized indices in the future to use this new algorithm, giving significant boosts in recall.

The recall gains spread from 2% to almost 10% for certain datasets with an additional 5-10% indexing cost when indexing with HNSW when compared with current BBQ.

@benwtrent benwtrent added >enhancement auto-backport Automatically create backport pull requests when merged :Search Relevance/Vectors Vector search v9.0.0 v8.18.0 labels Dec 4, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Dec 4, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Copy link
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

@@ -48,13 +48,8 @@ class ES816BinaryFlatVectorsScorer implements FlatVectorsScorer {
public RandomVectorScorerSupplier getRandomVectorScorerSupplier(
VectorSimilarityFunction similarityFunction,
KnnVectorValues vectorValues
) throws IOException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing write path for old codec

Comment on lines -93 to -100
RandomVectorScorerSupplier getRandomVectorScorerSupplier(
VectorSimilarityFunction similarityFunction,
ES816BinaryQuantizedVectorsWriter.OffHeapBinarizedQueryVectorValues scoringVectors,
BinarizedByteVectorValues targetVectors
) {
return new BinarizedRandomVectorScorerSupplier(scoringVectors, targetVectors, similarityFunction);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing write path for old codec

@Override
public String toString() {
return "ES816BinaryFlatVectorsScorer(nonQuantizedDelegate=" + nonQuantizedDelegate + ")";
}

/** Vector scorer supplier over binarized vector values */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing write path for old codec

return new ES816BinaryQuantizedVectorsWriter(scorer, rawVectorFormat.fieldsWriter(state), state);
throw new UnsupportedOperationException();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing write path for old codec

@@ -25,10 +25,8 @@
import org.apache.lucene.codecs.hnsw.FlatVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsReader;
import org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsWriter;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing write path for old codec

/**
* Copied from Lucene, replace with Lucene's implementation sometime after Lucene 10
*/
public class ES816BinaryQuantizedRWVectorsFormat extends ES816BinaryQuantizedVectorsFormat {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For old codec backward compat testing

@@ -63,7 +63,7 @@ protected Codec getCodec() {
return new Lucene100Codec() {
@Override
public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
return new ES816BinaryQuantizedVectorsFormat();
return new ES816BinaryQuantizedRWVectorsFormat();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For old codec backward compat testing

@@ -77,7 +77,7 @@ class ES816BinaryQuantizedVectorsWriter extends FlatVectorsWriter {
private final List<FieldWriter> fields = new ArrayList<>();
private final IndexOutput meta, binarizedVectorData;
private final FlatVectorsWriter rawVectorDelegate;
private final ES816BinaryFlatVectorsScorer vectorsScorer;
private final ES816BinaryFlatRWVectorsScorer vectorsScorer;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For old codec backward compat testing

import static org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat.DEFAULT_MAX_CONN;
import static org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat.DEFAULT_NUM_MERGE_WORKER;

class ES816HnswBinaryQuantizedRWVectorsFormat extends ES816HnswBinaryQuantizedVectorsFormat {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For old codec backward compat testing

@@ -59,7 +59,7 @@ protected Codec getCodec() {
return new Lucene100Codec() {
@Override
public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
return new ES816HnswBinaryQuantizedVectorsFormat();
return new ES816HnswBinaryQuantizedRWVectorsFormat();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For old codec backward compat testing

Copy link
Contributor

@john-wagster john-wagster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

double xe = 0.0;
double e = 0.0;
for (double xi : vector) {
double xiq = (a + step * Math.round((clamp(xi, a, b) - a) * stepInv));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap, this seems about right 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL, I should indeed add some comments here. We are basically calculating the error of quantizing and then unquantizing and shifting up or down based on the mis-calculation.

Copy link
Contributor

@pmpailis pmpailis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed tests and the main set of changes against the 816* versions and they LGTM. The math changes are a bit more difficult to review; but will give it a more thorough go on Monday 😅 (no need to wait though; please feel free to proceed w/o waiting for my review on that part)

@mayya-sharipova
Copy link
Contributor

@benwtrent and Tom V. amazing work! It would be nice to add some documentation to the format: it looks like the queries are still 4 bits quantized?

@benwtrent
Copy link
Member Author

It would be nice to add some documentation to the format: it looks like the queries are still 4 bits quantized?

The on disk format is very similar. I can add some docs on that to the format

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benwtrent Thanks Ben! Great work!

Similar to Panos, I reviewed file formats, as don't follow all the math in quantizer but I trust you and Tom got it right.

@benwtrent benwtrent added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Dec 9, 2024
@elasticsearchmachine elasticsearchmachine merged commit 5e859d9 into elastic:main Dec 9, 2024
16 checks passed
@benwtrent benwtrent deleted the feature/even-better-binary-quantization branch December 9, 2024 16:06
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 117994

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Dec 9, 2024
This measurably improves BBQ by adjusting the underlying algorithm to an
optimized per vector scalar quantization.

This is a brand new way to quantize vectors. Instead of there being a
global set of upper and lower quantile bands, these are optimized and
calculated per individual vector. Additionally, vectors are centered on
a common centroid.

This allows for an almost 32x reduction in memory, and even better
recall than before at the cost of slightly increasing indexing time.

Additionally, this new approach is easily generalizable to various other
bit sizes (e.g. 2 bits, etc.). While not taken advantage of yet, we may
update our scalar quantized indices in the future to use this new
algorithm, giving significant boosts in recall.

The recall gains spread from 2% to almost 10% for certain datasets with
an additional 5-10% indexing cost when indexing with HNSW when compared
with current BBQ.
elasticsearchmachine pushed a commit that referenced this pull request Dec 10, 2024
* Even better(er) binary quantization (#117994)

This measurably improves BBQ by adjusting the underlying algorithm to an
optimized per vector scalar quantization.

This is a brand new way to quantize vectors. Instead of there being a
global set of upper and lower quantile bands, these are optimized and
calculated per individual vector. Additionally, vectors are centered on
a common centroid.

This allows for an almost 32x reduction in memory, and even better
recall than before at the cost of slightly increasing indexing time.

Additionally, this new approach is easily generalizable to various other
bit sizes (e.g. 2 bits, etc.). While not taken advantage of yet, we may
update our scalar quantized indices in the future to use this new
algorithm, giving significant boosts in recall.

The recall gains spread from 2% to almost 10% for certain datasets with
an additional 5-10% indexing cost when indexing with HNSW when compared
with current BBQ.

* fixing backport

* fixing test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport pending >enhancement :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants